Tools for Monitoring and Controlling Distributed Applications
Authors
Abstract
The Meta system is a UNIX-based toolkit that assists in the construction of reliable reactive systems, such as distributed monitoring and debugging systems, tool integration systems, and reliable distributed applications. Meta provides mechanisms for instrumenting a distributed application and the environment in which it executes, and Meta supplies a service that can be used to monitor and control such an instrumented application. The Meta toolkit is built on top of the Isis toolkit; they can be used together in order to build fault-tolerant and adaptive distributed applications.

*This work was supported by the Defense Advanced Research Projects Agency (DoD) under NASA Ames grant number NAG 2-593, Contract N00140-87-C-8904. The views, opinions, and findings contained in this report are those of the authors and should not be construed as an official Department of Defense position, policy, or decision. This work was also partially supported by a grant from Xerox.

†This author was also partially supported by a G.T.E. Graduate Student Fellowship.

1 Constructing Reactive Systems

In a reactive system architecture, the system is partitioned into two pieces: an environment that follows a basic course of action, and a control program that monitors the state of the environment in order to influence the environment's progress. This architecture is very general. For example, process control systems, system monitors and debuggers, and tool integration services all have a reactive system structure.

Another application of the reactive system architecture is the structuring of distributed applications. For example, many distributed applications are constructed by taking off-the-shelf programs and connecting them with some communication subsystem. Such an application can be thought of as an "environment" with a state including the properties of the machines running the application, the current performance of the component programs, and the state of the communication subsystem.
The job of the control program is to monitor the state of the application in order to guarantee that the system operates efficiently in spite of changing load and failures. The control program can also be used to interconnect the application's components in a more loosely bound manner than conventional RPC mechanisms.

The Meta system, described in this paper, is a UNIX¹-based toolkit that provides the basic primitives needed to build a non-real-time reactive system. Using the toolkit, a distributed program can be instrumented with sensors and actuators in order to expose its state for purposes of control. Meta provides mechanisms that allow a control program to query the state of the instrumented application and to respond by invoking actuators when some condition of interest occurs. The toolkit includes facilities for structuring individual components into collections of components for fault tolerance. In addition, Meta guarantees that the monitoring and reaction is done atomically.

Meta itself is built on top of another toolkit, the Isis system. The application designer can use Isis for fault-tolerant communication and Meta for distributed control. In fact, the Meta project was started when four of us in the Isis project worked on integrating a distributed application constructed from off-the-shelf components [MCWB90]. The facility we found lacking in Isis was support for distributed control.

The next section introduces the architecture of an application managed by Meta. Section 3 presents how applications are instrumented, and Section 4 discusses how the resulting application is controlled. Finally, Section 5 presents the current status of Meta and discusses our future plans.

¹UNIX is a trademark of A.T.&T.

2 The Meta Architecture

The architecture of Meta can be illustrated through an example of managing a distributed application.
Consider an application that includes services and clients making use of the services. A given service consists of a set of identical servers replicated both for fault tolerance and for coarse-grained parallelism. Meta will be used to manage the services; in particular, if the load on a service is too large or the number of servers becomes too small due to crashes, then a new server is to be started and added to the service. Additionally, if a server's queue becomes too long, then waiting requests are to be migrated to less-loaded servers in the service. There are other conditions that would probably need to be maintained as well, such as reducing the number of servers when appropriate, but for the sake of brevity we will keep our example limited.

Meta structures a distributed application using a data model based on the entity-relation data model [Che76], with each instrumented component (i.e., a program equipped with sensors and actuators) being viewed as an entity and its sensors and actuators being the attributes of that entity. For example, a server in the above example could be instrumented with sensors that give the server's load and the queue of waiting requests. Entities of the same type, that is, having the same set of sensor and actuator attributes, form an entity set. Subsets of an entity set may be grouped together to form aggregates.

Aggregate structures provide control programs with a way of grouping related entities together and limiting actions to members of that group. For example, the servers composing a service can be grouped into an aggregate representing the service. Aggregates are themselves entities, and the system architect can define sensors and actuators on aggregates. An aggregate sensor is a function over the state of all the members of the aggregate. For example, a service aggregate could have a sensor that gives the median queue length of the servers in the service. An aggregate actuator causes an action to be performed on some subset (from one to all) of the current members.
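The entity and aggregate model described here can be illustrated with a small sketch. This is not Meta's interface: the class names (Entity, Server, Aggregate), the sensor and actuator names (queue_length, enqueue, dequeue, median_queue_length, migrate_one), and the migration policy are all hypothetical, chosen only to mirror the server example in the text.

```python
from statistics import median
from typing import Callable, Dict, Iterable, List


class Entity:
    """An instrumented component: sensors read state, actuators change it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sensors: Dict[str, Callable[[], object]] = {}
        self.actuators: Dict[str, Callable[..., object]] = {}

    def read(self, sensor: str):
        return self.sensors[sensor]()

    def invoke(self, actuator: str, *args):
        return self.actuators[actuator](*args)


class Server(Entity):
    """Hypothetical server entity exposing its queue of waiting requests."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.queue: List[str] = []
        self.sensors["queue_length"] = lambda: len(self.queue)
        self.actuators["enqueue"] = self.queue.append
        self.actuators["dequeue"] = lambda: self.queue.pop(0)


class Aggregate(Entity):
    """A group of same-typed entities that is itself an entity."""

    def __init__(self, name: str, members: Iterable[Entity]) -> None:
        super().__init__(name)
        self.members = list(members)
        # Aggregate sensor: a function over the state of all members.
        self.sensors["median_queue_length"] = lambda: median(
            m.read("queue_length") for m in self.members
        )
        # Aggregate actuator: acts on a subset of the members (here, two).
        self.actuators["migrate_one"] = self._migrate_one

    def _migrate_one(self) -> bool:
        """Move one request from the most loaded to the least loaded member."""
        src = max(self.members, key=lambda m: m.read("queue_length"))
        dst = min(self.members, key=lambda m: m.read("queue_length"))
        if src is dst or src.read("queue_length") == 0:
            return False
        dst.invoke("enqueue", src.invoke("dequeue"))
        return True
```

In this sketch the aggregate reaches its members only through their sensors and actuators, which is the property that lets a control program treat a service as a single entity.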
A distributed application is managed through the use of guarded commands; that is, through a set of (condition, action) pairs that reference the sensors and actuators of the instrumented application. These commands are executed by interpreters that reside in stubs (somewhat like RPC stubs) coresident with the instrumented programs, thus allowing for fast notification and reaction. Each condition is a proposition on the state of the system; references to both local sensors (those within the entity to which the stub is attached) and non-local sensors are allowed. The action portion is a sequence of actuator invocations that are executed atomically. Actions may enable guarded commands on another Meta stub; this facility allows one to write control programs that span multiple components.

Since guarded commands are evaluated in the same address space as an instrumented program, their impact on the performance of the application is a concern. The syntax of the guarded command language (a postfix language called NPL) is tailored for fast and efficient evaluation, and so we do not expect programs to be written directly in this language. We are designing an object-oriented control language called Lomita [MCWB90] that can be used to describe the structure of the application and to specify its control behavior. A Lomita program contains a schema specifying the entity and aggregate structure along with their sensors and actuators. The control behavior of the application is specified in Lomita through the use of rules, where the conditions for the rule may include real-time interval logic expressions [SMSV83]. Such temporal expressions are compiled into finite state automata, where the state transitions are implemented using Meta guarded commands.
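As an illustration of how a stub might evaluate guarded commands, the following sketch pairs a postfix condition with a sequence of actuator invocations. The token format is an assumption made for this example and is not the actual NPL syntax; the names eval_postfix, Stub, add_command, and step are likewise hypothetical. A local lock stands in for atomicity within one process only and does not model Meta's distributed guarantee.

```python
import operator
import threading
from typing import Callable, Dict, List, Sequence, Tuple, Union

Token = Union[str, int, float]

# Binary operators allowed in the (assumed) postfix condition format.
OPS: Dict[str, Callable] = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
}


def eval_postfix(tokens: Sequence[Token], sensors: Dict[str, Callable]):
    """Evaluate a postfix condition; non-operator strings name sensors."""
    stack: List[object] = []
    for tok in tokens:
        if isinstance(tok, str) and tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif isinstance(tok, str):
            stack.append(sensors[tok]())  # read the sensor's current value
        else:
            stack.append(tok)  # numeric literal
    if len(stack) != 1:
        raise ValueError("malformed postfix condition")
    return stack[0]


class Stub:
    """Hypothetical stub holding guarded commands for one entity."""

    def __init__(self, sensors: Dict[str, Callable],
                 actuators: Dict[str, Callable]) -> None:
        self.sensors = sensors
        self.actuators = actuators
        self.commands: List[Tuple[List[Token], List[str]]] = []
        self._lock = threading.Lock()

    def add_command(self, condition: Sequence[Token],
                    actions: Sequence[str]) -> None:
        self.commands.append((list(condition), list(actions)))

    def step(self) -> int:
        """Evaluate each guard once; return how many commands fired."""
        fired = 0
        for condition, actions in self.commands:
            # Guard evaluation and its actions run under one lock.
            with self._lock:
                if eval_postfix(condition, self.sensors):
                    for name in actions:
                        self.actuators[name]()
                    fired += 1
        return fired
```

A postfix form needs no parsing or precedence handling at evaluation time, which is consistent with the stated goal of keeping guard evaluation cheap inside the instrumented program's address space.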
Figure 1 illustrates the use of stubs. The machine M1 is running a server that has been instrumented, so there is a stub running in the same address space as this server that can directly access the sensors and actuators of the server. The machine is also running a separate Meta-supplied program accessing the various properties of the machine and its operating system, such as the amount of available memory and the processor load. This program is instrumented, and so has a stub that supports a set of sensors and actuators over the machine and operating system state.

3 Application Instrumentation

An application first must be instrumented before it can be controlled. This is accomplished by inserting into the application a small amount of code, and then linking the application with a Meta library. This section describes the instrumentation process in more detail.
Publication date: 1991